Fisher’s Exact Test (2×2) — Intuition + NumPy Implementation#

Fisher’s exact test answers a simple question:

Given a 2×2 contingency table, is there evidence that the two categorical variables are associated (not independent)?

It is especially useful when sample sizes are small (or expected counts are low), where large-sample approximations (like the chi-square test) can be unreliable.

What you’ll learn#

  • when Fisher’s exact test is the right tool

  • what “exact” means (conditioning on margins → hypergeometric distribution)

  • how the p-value is constructed for one-sided vs two-sided tests

  • a low-level NumPy-only implementation you can read end-to-end

  • how to interpret the result (and what it does not tell you)

Prerequisites#

  • basic probability (combinations)

  • null/alternative hypotheses + p-values

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.random.seed(42)

import sys
import plotly

print("python:", sys.version.split()[0])
print("numpy:", np.__version__)
print("plotly:", plotly.__version__)
python: 3.12.9
numpy: 1.26.2
plotly: 6.5.2

1) When to use Fisher’s exact test#

Use Fisher’s exact test when:

  • you have two categorical variables, each with two levels (a 2×2 table)

  • you want to test whether they are independent

  • counts are small (or expected counts are low), and you want an exact p-value

Common examples:

  • A/B tests: variant A vs B, conversion yes/no

  • clinical studies: treatment vs control, improved yes/no

  • survey analysis: group membership vs response category

Fisher’s exact test is valid for any sample size, but it’s most often chosen when the chi-square approximation is questionable.
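A common rule of thumb (a heuristic, not part of the implementation later in this notebook) is to prefer Fisher's exact test when any expected count under independence falls below 5. A minimal NumPy sketch, with an illustrative helper name, applied to the example table used throughout this notebook:

```python
import numpy as np

def expected_counts_ok(table, threshold=5.0):
    """Rule-of-thumb check (illustrative helper): are all expected counts
    under independence at least `threshold`? If not, the chi-square
    approximation is questionable and Fisher's exact test is a safer choice."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return bool(np.all(expected >= threshold)), expected

ok, expected = expected_counts_ok([[8, 2], [1, 5]])
print("all expected counts >= 5:", ok)
print(expected)
```

For this table several expected counts fall below 5, which is exactly the small-count regime where the exact test is preferred.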

2) The 2×2 table + hypotheses#

We’ll write the 2×2 table like this:

|           | Outcome = 1 | Outcome = 0 |
|-----------|-------------|-------------|
| Group = 1 | a           | b           |
| Group = 0 | c           | d           |

  • Null hypothesis (H₀): the variables are independent (equivalently, the odds ratio = 1)

  • Alternative (H₁) depends on the question:

    • greater: Group=1 has higher odds of Outcome=1 (odds ratio > 1)

    • less: Group=1 has lower odds of Outcome=1 (odds ratio < 1)

    • two-sided: any association (odds ratio ≠ 1)

# Example: treatment (1) vs control (0), success (1) vs failure (0)
treatment = np.array([1] * 10 + [0] * 6)
success = np.array([1] * 8 + [0] * 2 + [1] * 1 + [0] * 5)

a = int(np.sum((treatment == 1) & (success == 1)))
b = int(np.sum((treatment == 1) & (success == 0)))
c = int(np.sum((treatment == 0) & (success == 1)))
d = int(np.sum((treatment == 0) & (success == 0)))

table = np.array([[a, b], [c, d]], dtype=int)
table
array([[8, 2],
       [1, 5]])
row_labels = ["Treatment", "Control"]
col_labels = ["Success", "Failure"]

fig = px.imshow(
    table,
    text_auto=True,
    aspect="auto",
    x=col_labels,
    y=row_labels,
    color_continuous_scale="Blues",
    title="Observed 2×2 contingency table",
)
fig.update_layout(coloraxis_showscale=False)
fig.show()

n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n

fig = px.imshow(
    expected,
    text_auto=".2f",
    aspect="auto",
    x=col_labels,
    y=row_labels,
    color_continuous_scale="Greens",
    title="Expected counts under independence (H₀)",
)
fig.update_layout(coloraxis_showscale=False)
fig.show()

3) Effect size: the odds ratio#

The odds ratio (OR) is a common effect size for 2×2 tables:

\[\text{OR} = \frac{a\,d}{b\,c}\]

Interpretation:

  • OR = 1: no association (what H₀ asserts)

  • OR > 1: Group=1 is more likely to have Outcome=1 (positive association)

  • OR < 1: Group=1 is less likely to have Outcome=1 (negative association)

Fisher’s exact test gives you a p-value for the association. You usually report both the p-value and an effect size (like OR).

a, b, c, d = table.ravel()
num = a * d
den = b * c

odds_ratio = (num / den) if den != 0 else (np.inf if num > 0 else np.nan)
odds_ratio
20.0

4) What makes it “exact”: conditioning on the margins#

The key idea behind Fisher’s exact test is conditioning on the margins (row sums and column sums).

For a 2×2 table, if the margins are fixed:

  • row sums: \(r_1 = a+b\), \(r_2 = c+d\)

  • column sums: \(c_1 = a+c\), \(c_2 = b+d\)

  • total: \(n = r_1 + r_2\)

…then the whole table is determined by just one number: the top-left cell \(a\).

Under \(H_0\) (independence) and given the margins, the distribution of \(a\) is hypergeometric:

\[\mathbb{P}(A=a \mid r_1, c_1, n) = \frac{\binom{c_1}{a}\,\binom{c_2}{r_1-a}}{\binom{n}{r_1}}\]

So we can compute probabilities of all possible 2×2 tables with these same margins — exactly.
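To make the "one free cell" claim concrete, here is a minimal sketch that enumerates every feasible table for the margins of the example above (\(r_1 = 10\), \(c_1 = 9\), \(n = 16\)) and checks that each one reproduces exactly the same margins:

```python
import numpy as np

# Margins of the example table [[8, 2], [1, 5]]: r1 = 10, c1 = 9, n = 16
r1, c1, n = 10, 9, 16
r2, c2 = n - r1, n - c1

a_lo = max(0, r1 - c2)  # a cannot be so small that b = r1 - a exceeds c2
a_hi = min(r1, c1)      # a cannot exceed its row total or column total

for a in range(a_lo, a_hi + 1):
    b, c = r1 - a, c1 - a
    d = r2 - c
    t = np.array([[a, b], [c, d]])
    # Every enumerated table reproduces exactly the same margins
    assert t[0].sum() == r1 and t[:, 0].sum() == c1 and t.sum() == n

print(f"feasible values of a: {a_lo}..{a_hi}")
```

The feasible range matches the support of the hypergeometric PMF computed below.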

def log_factorials_upto(n: int) -> np.ndarray:
    """Return log(k!) for k=0..n as a NumPy array."""
    n = int(n)
    log_fact = np.zeros(n + 1, dtype=float)
    if n >= 1:
        log_fact[1:] = np.cumsum(np.log(np.arange(1, n + 1)))
    return log_fact


def hypergeom_pmf_for_a_values(
    a_values: np.ndarray,
    *,
    r1: int,
    c1: int,
    n: int,
    log_fact=None,
) -> np.ndarray:
    """Hypergeometric PMF for A (top-left cell) given fixed margins."""
    a_values = np.asarray(a_values, dtype=int)
    r1 = int(r1)
    c1 = int(c1)
    n = int(n)
    c2 = n - c1

    if log_fact is None:
        log_fact = log_factorials_upto(n)

    def log_choose(n_: int, k_: np.ndarray) -> np.ndarray:
        return log_fact[n_] - log_fact[k_] - log_fact[n_ - k_]

    log_p = (
        log_choose(c1, a_values)
        + log_choose(c2, r1 - a_values)
        - (log_fact[n] - log_fact[r1] - log_fact[n - r1])
    )

    # Shift by the max before exponentiating for numerical stability,
    # then renormalize (exact when a_values spans the full support)
    log_p = log_p - np.max(log_p)
    p = np.exp(log_p)
    return p / np.sum(p)


a_obs = int(table[0, 0])
r1 = int(table[0, :].sum())
c1 = int(table[:, 0].sum())
n = int(table.sum())
c2 = n - c1

a_min = max(0, r1 - c2)
a_max = min(r1, c1)
a_values = np.arange(a_min, a_max + 1)

pmf = hypergeom_pmf_for_a_values(a_values, r1=r1, c1=c1, n=n)

np.column_stack([a_values, pmf])
array([[3.00000000e+00, 1.04895105e-02],
       [4.00000000e+00, 1.10139860e-01],
       [5.00000000e+00, 3.30419580e-01],
       [6.00000000e+00, 3.67132867e-01],
       [7.00000000e+00, 1.57342657e-01],
       [8.00000000e+00, 2.36013986e-02],
       [9.00000000e+00, 8.74125874e-04]])
def odds_ratio_for_a_values(a_values: np.ndarray, *, r1: int, c1: int, n: int) -> np.ndarray:
    """Compute odds ratios for all feasible tables with varying a and fixed margins."""
    a_values = np.asarray(a_values, dtype=int)
    r1 = int(r1)
    c1 = int(c1)
    n = int(n)
    r2 = n - r1

    b = r1 - a_values
    c = c1 - a_values
    d = r2 - c

    num = a_values.astype(float) * d
    den = b.astype(float) * c

    or_ = np.full_like(num, np.nan, dtype=float)
    mask = den != 0
    or_[mask] = num[mask] / den[mask]
    or_[~mask & (num > 0)] = np.inf
    return or_


or_values = odds_ratio_for_a_values(a_values, r1=r1, c1=c1, n=n)

colors = np.where(a_values == a_obs, "#111111", "#636EFA")
fig = go.Figure(
    go.Bar(
        x=a_values,
        y=pmf,
        marker_color=colors,
        customdata=np.column_stack([or_values]),
        hovertemplate="a=%{x}<br>P=%{y:.6f}<br>OR=%{customdata[0]:.3g}<extra></extra>",
    )
)
fig.add_vline(x=a_obs, line_color="#111111", line_dash="dash")
fig.update_layout(
    title="All feasible tables (fixed margins) → hypergeometric PMF for a",
    xaxis_title="a = count in the top-left cell",
    yaxis_title="Probability under H₀ (conditional on margins)",
)
fig.show()

5) From the PMF to a p-value#

Once we have the probability of every feasible table (given the margins), we can define “extreme” outcomes.

One-sided p-values#

  • greater: sum probabilities of tables with a ≥ a_obs (more evidence of positive association)

  • less: sum probabilities of tables with a ≤ a_obs (more evidence of negative association)

Two-sided p-value (common definition)#

Because the test statistic has a discrete distribution, "two-sided" needs a precise definition.

A widely used definition (the one SciPy implements) is:

Sum the probabilities of all tables whose probability is less than or equal to the observed table's probability.

This counts both tails: every table at most as likely as the observed one contributes to the p-value.

p_obs = pmf[a_obs - a_min]
p_greater = float(pmf[a_values >= a_obs].sum())
p_less = float(pmf[a_values <= a_obs].sum())
p_two_sided = float(pmf[pmf <= p_obs + 1e-12].sum())

p_greater, p_less, p_two_sided
(0.024475524475524438, 0.9991258741258742, 0.03496503496503492)
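One identity worth knowing: because the distribution is discrete, the two one-sided p-values overlap in exactly one point mass (the observed table), so `p_greater + p_less = 1 + P(A = a_obs)`. A self-contained check, recomputing the PMF for the example margins with `math.comb`:

```python
from math import comb

# Hypergeometric PMF over the feasible support for r1 = 10, c1 = 9, n = 16
r1, c1, n, a_obs = 10, 9, 16, 8
c2 = n - c1
support = range(max(0, r1 - c2), min(r1, c1) + 1)
pmf = {a: comb(c1, a) * comb(c2, r1 - a) / comb(n, r1) for a in support}

p_greater = sum(p for a, p in pmf.items() if a >= a_obs)
p_less = sum(p for a, p in pmf.items() if a <= a_obs)

# The one-sided p-values double-count exactly the observed table's probability
print(abs((p_greater + p_less) - (1.0 + pmf[a_obs])) < 1e-12)
```

This is a handy sanity check for any implementation of a discrete one-sided test.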

6) Fisher’s exact test from scratch (NumPy-only)#

Below is a complete implementation of Fisher’s exact test for a 2×2 table.

  • It enumerates all feasible tables (via the feasible values of \(a\)).

  • It computes the hypergeometric probabilities in a numerically stable way.

  • It supports greater, less, and two-sided.

def fisher_exact_numpy(table: np.ndarray, alternative: str = "two-sided", return_details: bool = False):
    """Fisher's exact test for a 2x2 contingency table (NumPy-only).

    Parameters
    ----------
    table : array-like, shape (2, 2)
        Non-negative counts.
    alternative : {'two-sided', 'greater', 'less'}
        Defines the alternative hypothesis.
    return_details : bool
        If True, also return the enumerated support and PMF.

    Returns
    -------
    odds_ratio : float
    p_value : float
    details : dict (optional)
    """
    table = np.asarray(table, dtype=int)
    if table.shape != (2, 2):
        raise ValueError("table must be shape (2, 2)")
    if np.any(table < 0):
        raise ValueError("counts must be non-negative")

    a, b, c, d = table.ravel()
    r1 = int(a + b)
    r2 = int(c + d)
    c1 = int(a + c)
    c2 = int(b + d)
    n = int(r1 + r2)

    # Sample odds ratio (effect size)
    num = a * d
    den = b * c
    odds_ratio = (num / den) if den != 0 else (np.inf if num > 0 else np.nan)

    # Enumerate feasible values of a given fixed margins
    a_min = max(0, r1 - c2)
    a_max = min(r1, c1)
    a_values = np.arange(a_min, a_max + 1)

    log_fact = log_factorials_upto(n)
    pmf = hypergeom_pmf_for_a_values(a_values, r1=r1, c1=c1, n=n, log_fact=log_fact)
    p_obs = pmf[int(a - a_min)]

    alt = alternative.lower().replace("_", "-").strip()
    if alt in {"greater", "right", "right-sided", "right sided"}:
        p_value = float(pmf[a_values >= a].sum())
    elif alt in {"less", "left", "left-sided", "left sided"}:
        p_value = float(pmf[a_values <= a].sum())
    elif alt in {"two-sided", "two sided"}:
        p_value = float(pmf[pmf <= p_obs + 1e-12].sum())
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    p_value = float(min(p_value, 1.0))

    if not return_details:
        return odds_ratio, p_value

    details = {
        "a_values": a_values,
        "pmf": pmf,
        "a_obs": int(a),
        "p_obs": float(p_obs),
        "margins": {"r1": r1, "r2": r2, "c1": c1, "c2": c2, "n": n},
    }
    return odds_ratio, p_value, details


for alt in ["greater", "less", "two-sided"]:
    or_, p_ = fisher_exact_numpy(table, alternative=alt)
    print(f"{alt:>9} | odds ratio = {or_:>6.3g} | p-value = {p_:.6f}")
  greater | odds ratio =     20 | p-value = 0.024476
     less | odds ratio =     20 | p-value = 0.999126
two-sided | odds ratio =     20 | p-value = 0.034965
# Optional: verify against SciPy (if installed)
try:
    from scipy.stats import fisher_exact

    for alt in ["greater", "less", "two-sided"]:
        or_scipy, p_scipy = fisher_exact(table, alternative=alt)
        or_np, p_np = fisher_exact_numpy(table, alternative=alt)
        print(f"{alt:>9} | scipy p={p_scipy:.6f} | numpy p={p_np:.6f} | scipy OR={or_scipy:.3g}")
except Exception as e:
    print("SciPy check skipped:", e)
  greater | scipy p=0.024476 | numpy p=0.024476 | scipy OR=20
     less | scipy p=0.999126 | numpy p=0.999126 | scipy OR=20
two-sided | scipy p=0.034965 | numpy p=0.034965 | scipy OR=20

7) Visualizing “extreme” tables (greater / less / two-sided)#

The plots below show which feasible tables are counted in the p-value.

  • Gray bars: feasible tables that are not counted

  • Red bars: feasible tables that are counted for that alternative

  • The vertical dashed line marks the observed value \(a_{obs}\)

def plot_pmf_with_rejection_region(details: dict, *, alternative: str) -> go.Figure:
    a_values = details["a_values"]
    pmf = details["pmf"]
    a_obs = details["a_obs"]
    p_obs = details["p_obs"]
    r1 = details["margins"]["r1"]
    c1 = details["margins"]["c1"]
    n = details["margins"]["n"]

    or_values = odds_ratio_for_a_values(a_values, r1=r1, c1=c1, n=n)

    alt = alternative.lower().replace("_", "-").strip()
    if alt in {"greater", "right", "right-sided", "right sided"}:
        mask = a_values >= a_obs
        p_value = float(pmf[mask].sum())
        title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"
    elif alt in {"less", "left", "left-sided", "left sided"}:
        mask = a_values <= a_obs
        p_value = float(pmf[mask].sum())
        title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"
    elif alt in {"two-sided", "two sided"}:
        mask = pmf <= p_obs + 1e-12
        p_value = float(pmf[mask].sum())
        title = f"Fisher exact ({alternative}): p-value = {p_value:.6f}"
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    colors = np.where(mask, "#EF553B", "#B0B0B0")

    fig = go.Figure(
        go.Bar(
            x=a_values,
            y=pmf,
            marker_color=colors,
            customdata=np.column_stack([or_values]),
            hovertemplate="a=%{x}<br>P=%{y:.6f}<br>OR=%{customdata[0]:.3g}<extra></extra>",
        )
    )
    fig.add_vline(x=a_obs, line_color="#111111", line_dash="dash")
    fig.update_layout(
        title=title,
        xaxis_title="a (top-left cell)",
        yaxis_title="Probability under H₀ (conditional on margins)",
    )
    return fig


_, _, details = fisher_exact_numpy(table, return_details=True)

for alt in ["greater", "less", "two-sided"]:
    plot_pmf_with_rejection_region(details, alternative=alt).show()

8) How to interpret Fisher’s exact test#

What the p-value means here#

With Fisher’s exact test, the p-value is:

The probability (under H₀: independence) of observing a table at least as extreme as the one you saw, given that the margins are fixed.

So:

  • a small p-value suggests the observed association would be rare under independence → evidence against H₀

  • a large p-value means the data are not surprising under independence → not enough evidence to reject H₀

What it does not mean#

  • It does not say “the probability H₀ is true”.

  • It does not tell you the size of the association (use OR / risk ratio / risk difference for that).

A good report typically includes:

  • the 2×2 table

  • the odds ratio (effect size)

  • the p-value (and the chosen alternative)
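As a sketch of such a report, a small formatting helper (the name `report_fisher` and the layout are illustrative; the odds ratio and p-value are plugged in from the example above):

```python
import numpy as np

def report_fisher(table, odds_ratio, p_value, alternative="two-sided"):
    """Format the three pieces above into one line (layout is illustrative).
    odds_ratio and p_value can come from fisher_exact_numpy or SciPy."""
    a, b, c, d = np.asarray(table).ravel()
    return (
        f"2x2 table [[{a}, {b}], [{c}, {d}]] | "
        f"OR = {odds_ratio:.3g} | p = {p_value:.4f} ({alternative})"
    )

print(report_fisher([[8, 2], [1, 5]], 20.0, 0.034965))
```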

9) A helpful sanity check: p-values under the null are discrete#

Because Fisher’s exact test works with a discrete distribution (the hypergeometric), the set of possible p-values is discrete.

If we repeatedly sample tables from the null (with the same fixed margins) and compute two-sided p-values, the histogram shows spikes rather than a perfectly uniform distribution. The test remains valid; because of the discreteness it is typically conservative (the rejection rate at level α is at most α).

r1 = details["margins"]["r1"]
c1 = details["margins"]["c1"]
n = details["margins"]["n"]
c2 = n - c1

a_values = details["a_values"]
pmf = details["pmf"]

two_sided_p_for_each_a = np.array([float(pmf[pmf <= p_i + 1e-12].sum()) for p_i in pmf])

n_sims = 20_000
a_sim = np.random.hypergeometric(ngood=c1, nbad=c2, nsample=r1, size=n_sims)
p_sim = two_sided_p_for_each_a[a_sim - a_values.min()]

alpha = 0.05
print("Pr(reject at alpha=0.05) under H0 (empirical):", float(np.mean(p_sim <= alpha)))

fig = px.histogram(
    p_sim,
    nbins=30,
    title="Two-sided Fisher exact p-values under H₀ (fixed margins)",
    labels={"value": "p-value"},
)
fig.add_vline(x=alpha, line_color="#EF553B", line_dash="dash")
fig.show()
Pr(reject at alpha=0.05) under H0 (empirical): 0.03565

10) Pitfalls + practical notes#

  • Conditional on margins: Fisher’s test conditions on fixed row/column totals. In some study designs (e.g., case–control), margins are naturally fixed; in others they aren’t, but the test is still commonly used.

  • Two-sided definition: multiple “two-sided Fisher” definitions exist. Always specify which one you use (this notebook uses the common “probability ≤ observed probability” rule).

  • Zeros → infinite OR: if a cell is 0, the sample odds ratio can be 0 or ∞. That’s not “wrong”, but interpret carefully and consider reporting confidence intervals with appropriate methods.

  • p-value vs effect size: a tiny p-value can correspond to a small effect with large n; a large p-value can occur with a large OR but tiny n. Always look at the table and an effect size.

  • Multiple testing: if you run many Fisher tests, adjust for multiple comparisons.
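For the zero-cell pitfall, one common remedy is the Haldane–Anscombe correction: add 0.5 to every cell before computing the sample odds ratio. A minimal sketch (this correction is not used elsewhere in this notebook, and you should state it when reporting):

```python
import numpy as np

def haldane_anscombe_or(table):
    """Sample odds ratio with the Haldane-Anscombe correction: add 0.5 to
    every cell before computing a*d / (b*c). Gives a finite estimate when
    a cell is zero."""
    a, b, c, d = np.asarray(table, dtype=float).ravel() + 0.5
    return (a * d) / (b * c)

# With a zero cell the raw OR is infinite; the corrected OR is finite
zero_table = np.array([[7, 0], [2, 6]])
print("corrected OR:", haldane_anscombe_or(zero_table))
```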

11) Exercises#

  1. Pick a different 2×2 table and compute Fisher’s exact p-values for greater, less, and two-sided.

  2. Change the margins (row/column totals) while keeping the odds ratio similar — how does the p-value change?

  3. For fixed margins, compute the PMF and plot it; identify which tables contribute to the two-sided p-value.

References#

  • Fisher, R. A. (1922). On the interpretation of χ² from contingency tables, and the calculation of P. Journal of the Royal Statistical Society, 85(1), 87–94.

  • Hypergeometric distribution: see any standard probability text.

  • SciPy: scipy.stats.fisher_exact (for a reference implementation).